Focused Web Crawling for E-Learning Content

Author

  • Pabitra Mitra
Abstract

This work describes the design of the focused crawler for Intinno, an intelligent web-based content management system. The Intinno system aims to overcome a drawback of existing learning management systems: scarcity of content, which often leads to the cold start problem. The scarcity problem is addressed by using a focused crawler to mine educational content from the web, specifically course pages from university websites. We present a survey of probabilistic models, such as Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs), for building a focused crawler, and finally describe the design of the system using CRFs.
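To illustrate the general idea of focused crawling mentioned in the abstract, here is a minimal sketch of a best-first crawl loop. It is not the CRF-based design the paper describes: the keyword scorer `relevance`, the term list `COURSE_TERMS`, the in-memory `TOY_WEB`, and the `fetch` helper are all hypothetical stand-ins chosen so the example runs without network access.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical keyword list standing in for a learned relevance model.
COURSE_TERMS = ("syllabus", "lecture", "assignment", "course", "exam")

def relevance(text: str) -> float:
    """Fraction of COURSE_TERMS found in the text (toy scorer, an assumption)."""
    lowered = text.lower()
    return sum(term in lowered for term in COURSE_TERMS) / len(COURSE_TERMS)

def focused_crawl(
    seeds: List[str],
    fetch: Callable[[str], Tuple[str, List[str]]],
    max_pages: int = 10,
    threshold: float = 0.4,
) -> List[str]:
    """Best-first crawl: links found on relevant pages are visited first."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected: List[str] = []
    visited = 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        visited += 1
        score = relevance(text)
        if score >= threshold:
            collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # An outgoing link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return collected

# Toy in-memory "web": url -> (page text, outgoing links). Entirely made up.
TOY_WEB: Dict[str, Tuple[str, List[str]]] = {
    "u/home": ("University home page", ["u/cs101", "u/sports"]),
    "u/cs101": ("Course CS101: syllabus, lecture notes, assignment list", ["u/cs101/exam"]),
    "u/sports": ("Football scores", ["u/tickets"]),
    "u/cs101/exam": ("Course exam schedule and lecture review", []),
    "u/tickets": ("Buy tickets", []),
}

def fetch(url: str) -> Tuple[str, List[str]]:
    """Stand-in for an HTTP fetch plus link extraction."""
    return TOY_WEB[url]

if __name__ == "__main__":
    print(focused_crawl(["u/home"], fetch, max_pages=3))
```

With a page budget of three, the loop spends its fetches on the course page and its exam page rather than the off-topic branch. A probabilistic model such as an HMM or CRF would replace the keyword scorer in a real system.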

Similar resources

Semantic Focused Crawling for Retrieving E-Commerce Information

Focused crawling is proposed to selectively seek out pages that are relevant to a predefined set of topics without downloading all pages of the Web. With the rapid growth of e-commerce, how to discover specific information, such as information about buyers, sellers, and products, suited to the online business user has become a key issue for information search engines. We present a novel sema...

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading domain-specific web pages is not a simple task. This unfocused approach often shows undesired results. Several new ideas have therefore been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...

StumbleUpon Evergreen Classification Challenge (Website Classification Problem)

Web classification is a very important machine learning problem with wide applicability in tasks such as news classification, content prioritization, focused crawling, and sentiment analysis of web content. In this project, we primarily focus on developing a prediction model using machine learning techniques for one such problem: classifying whether a web posting is of eternal relevance, known as ev...

Exploiting Multiple Features with MEMMs for Focused Web Crawling

Focused web crawling traverses the Web to collect documents on a specific topic. This is not an easy task, since focused crawlers need to identify the next most promising link to follow based on the topic and the content and links of previously crawled pages. In this paper, we present a framework based on Maximum Entropy Markov Models (MEMMs) for an enhanced focused web crawler to take advantage ...

A New Approach Towards Vertical Search Engines - Intelligent Focused Crawling and Multilingual Semantic Techniques

Search engines typically consist of a crawler which traverses the web retrieving documents and a search frontend which provides the user interface to the acquired information. Focused crawlers refine the crawler by intelligently directing it to predefined topic areas. The evolution of search engines today is expedited by supplying more search capabilities such as a search for metadata as well a...

Publication date: 2008